Goto

Collaborating Authors

 optimistic estimate


Linear Contextual Bandits with Knapsacks

Shipra Agrawal, Nikhil Devanur

Neural Information Processing Systems

In each round, the outcome of pulling an arm is a reward as well as a vector of resource consumptions. The expected values of these outcomes depend linearly on the context of that arm.



SubROC: AUC-Based Discovery of Exceptional Subgroup Performance for Binary Classifiers

Siegl, Tom, Coşkun, Kutalmış, Hiller, Bjarne C., Mirzaei, Amin, Lemmerich, Florian, Becker, Martin

arXiv.org Artificial Intelligence

Machine learning (ML) is increasingly employed in real-world applications like medicine or economics, thus, potentially affecting large populations. However, ML models often do not perform homogeneously, leading to underperformance or, conversely, unusually high performance in certain subgroups (e.g., sex=female AND marital_status=married). Identifying such subgroups can support practical decisions on which subpopulation a model is safe to deploy or where more training data is required. However, an efficient and coherent framework for effective search is missing. Consequently, we introduce SubROC, an open-source, easy-to-use framework based on Exceptional Model Mining for reliably and efficiently finding strengths and weaknesses of classification models in the form of interpretable population subgroups. SubROC incorporates common evaluation measures (ROC and PR AUC), efficient search space pruning for fast exhaustive subgroup search, control for class imbalance, adjustment for redundant patterns, and significance testing. We illustrate the practical benefits of SubROC in case studies as well as in comparative analyses across multiple datasets.



No-Regret Learning for Stackelberg Equilibrium Computation in Newsvendor Pricing Games

Liu, Larkin, Rong, Yuming

arXiv.org Artificial Intelligence

We introduce the application of online learning in a Stackelberg game pertaining to a system with two learning agents in a dyadic exchange network, consisting of a supplier and retailer, specifically where the parameters of the demand function are unknown. In this game, the supplier is the first-moving leader, and must determine the optimal wholesale price of the product. Subsequently, the retailer who is the follower, must determine both the optimal procurement amount and selling price of the product. In the perfect information setting, this is known as the classical price-setting Newsvendor problem, and we prove the existence of a unique Stackelberg equilibrium when extending this to a two-player pricing game. In the framework of online learning, the parameters of the reward function for both the follower and leader must be learned, under the assumption that the follower will best respond with optimism under uncertainty. A novel algorithm based on contextual linear bandits with a measurable uncertainty set is used to provide a confidence bound on the parameters of the stochastic demand. Consequently, optimal finite time regret bounds on the Stackelberg regret, along with convergence guarantees to an approximate Stackelberg equilibrium, are provided.


Linear Contextual Bandits with Knapsacks

Neural Information Processing Systems

We consider the linear contextual bandit problem with resource consumption, in addition to reward generation. In each round, the outcome of pulling an arm is a reward as well as a vector of resource consumptions. The expected values of these outcomes depend linearly on the context of that arm. The budget/capacity constraints require that the total consumption doesn't exceed the budget for each resource. The objective is once again to maximize the total reward. This problem turns out to be a common generalization of classic linear contextual bandits (linContextual) [8, 11, 1], bandits with knapsacks (BwK) [3, 9], and the online stochastic packing problem (OSPP) [4, 14].


Clustered Linear Contextual Bandits with Knapsacks

Deng, Yichuan, Mamakos, Michalis, Song, Zhao

arXiv.org Artificial Intelligence

In this work, we study clustered contextual bandits where rewards and resource consumption are the outcomes of cluster-specific linear models. The arms are divided in clusters, with the cluster memberships being unknown to an algorithm. Pulling an arm in a time period results in a reward and in consumption for each one of multiple resources, and with the total consumption of any resource exceeding a constraint implying the termination of the algorithm. Thus, maximizing the total reward requires learning not only models about the reward and the resource consumption, but also cluster memberships. We provide an algorithm that achieves regret sublinear in the number of time periods, without requiring access to all of the arms. In particular, we show that it suffices to perform clustering only once to a randomly selected subset of the arms. To achieve this result, we provide a sophisticated combination of techniques from the literature of econometrics and of bandits with constraints.


Optimistic Estimate Uncovers the Potential of Nonlinear Models

Zhang, Yaoyu, Zhang, Zhongwang, Zhang, Leyang, Bai, Zhiwei, Luo, Tao, Xu, Zhi-Qin John

arXiv.org Artificial Intelligence

We propose an optimistic estimate to evaluate the best possible fitting performance of nonlinear models. It yields an optimistic sample size that quantifies the smallest possible sample size to fit/recover a target function using a nonlinear model. We estimate the optimistic sample sizes for matrix factorization models, deep models, and deep neural networks (DNNs) with fully-connected or convolutional architecture. For each nonlinear model, our estimates predict a specific subset of targets that can be fitted at overparameterization, which are confirmed by our experiments. Our optimistic estimate reveals two special properties of the DNN models -- free expressiveness in width and costly expressiveness in connection. These properties suggest the following architecture design principles of DNNs: (i) feel free to add neurons/kernels; (ii) restrain from connecting neurons. Overall, our optimistic estimate theoretically unveils the vast potential of nonlinear models in fitting at overparameterization. Based on this framework, we anticipate gaining a deeper understanding of how and why numerous nonlinear models such as DNNs can effectively realize their potential in practice in the near future.


What is the state of the art? Accounting for multiplicity in machine learning benchmark performance

Møllersen, Kajsa, Holsbø, Einar

arXiv.org Artificial Intelligence

Machine learning methods are commonly evaluated and compared by their performance on data sets from public repositories. This allows for multiple methods, oftentimes several thousands, to be evaluated under identical conditions and across time. The highest ranked performance on a problem is referred to as state-of-the-art (SOTA) performance, and is used, among other things, as a reference point for publication of new methods. Using the highest-ranked performance as an estimate for SOTA is a biased estimator, giving overly optimistic results. The mechanisms at play are those of multiplicity, a topic that is well-studied in the context of multiple comparisons and multiple testing, but has, as far as the authors are aware of, been nearly absent from the discussion regarding SOTA estimates. The optimistic state-of-the-art estimate is used as a standard for evaluating new methods, and methods with substantial inferior results are easily overlooked. In this article, we provide a probability distribution for the case of multiple classifiers so that known analyses methods can be engaged and a better SOTA estimate can be provided. We demonstrate the impact of multiplicity through a simulated example with independent classifiers. We show how classifier dependency impacts the variance, but also that the impact is limited when the accuracy is high. Finally, we discuss a real-world example; a Kaggle competition from 2020.


Beyond Individual and Group Fairness

Awasthi, Pranjal, Cortes, Corinna, Mansour, Yishay, Mohri, Mehryar

arXiv.org Machine Learning

Learning algorithms trained on large amounts of data are increasingly adopted in applications with significant individual and social consequences such as selecting loan applicants, filtering resumes of job applicants, estimating the likelihood for a defendant to commit future crimes, or deciding where to deploy police officers. Analyzing the risk of bias in these systems is therefore crucial. In fact, that is also critical for seemingly less socially consequential applications such as ads placement, recommendation systems, speech recognition, and many other common applications of machine learning. Such biases can appear due to the way the training data has been collected, due to an improper choice of the loss function optimized, or as a result of some other algorithmic choices.